SVM-Based Spam Filter with Active and Online Learning
Authors
Abstract
A realistic classification model for spam filtering should take into account not only that spam evolves over time, but also that labeling a large number of examples for initial training can be expensive in both time and money. This paper addresses the problem of separating legitimate emails from unsolicited ones with an active and online learning algorithm, using a Support Vector Machine (SVM) as the base classifier. We evaluate its effectiveness using a set of goodness criteria on the TREC 2006 spam filtering benchmark datasets, and promising results are reported.
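The general idea of combining active and online learning with a linear SVM can be sketched as follows. The abstract does not state the query strategy or the solver, so this sketch assumes margin-based uncertainty sampling (ask for a label only when the score is close to the decision boundary) and a stochastic-gradient step on the regularised hinge loss. All function names, the sparse dictionary representation and the parameter values are hypothetical and are not taken from the paper.

```python
def margin(w, x):
    """Signed score w . x for a sparse feature dict x (token -> value)."""
    return sum(w.get(f, 0.0) * v for f, v in x.items())


def hinge_update(w, x, y, lr=0.1, reg=0.01):
    """One SGD step on the L2-regularised hinge loss; y is +1 (spam) or -1 (ham)."""
    violated = y * margin(w, x) < 1.0   # evaluated before the weights change
    for f in list(w):
        w[f] *= 1.0 - lr * reg          # shrinkage from the regulariser
    if violated:                        # hinge loss is active: move towards y * x
        for f, v in x.items():
            w[f] = w.get(f, 0.0) + lr * y * v


def active_online_filter(stream, oracle, threshold=1.0):
    """Classify messages one at a time, querying a label only when unsure.

    stream    -- iterable of sparse feature dicts, in arrival order
    oracle    -- callable returning the true label (+1 / -1); stands in for a human
    threshold -- assumed uncertainty band: query when |score| < threshold
    """
    w, predictions, queries = {}, [], 0
    for x in stream:
        score = margin(w, x)
        predictions.append(1 if score > 0 else -1)
        if abs(score) < threshold:      # uncertain: pay for a label and learn from it
            queries += 1
            hinge_update(w, x, oracle(x))
    return w, predictions, queries


# Toy usage: two repeating messages, labelled by a trivial stand-in oracle.
spam = {"viagra": 1.0, "free": 1.0}
ham = {"meeting": 1.0, "agenda": 1.0}
toy_stream = [spam, ham] * 20
weights, preds, n_queries = active_online_filter(
    toy_stream, oracle=lambda x: 1 if "viagra" in x else -1
)
print(n_queries, "labels requested for", len(toy_stream), "messages")
```

The threshold controls the trade-off the abstract points at: a wider band requests more labels and adapts faster to changing spam, while a narrower band saves labeling cost.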
Similar resources
Ideas and Applications on Support Vector Machine Active Learning
For most algorithms we studied in our machine learning course CS545 [1], we choose training samples randomly from a large pool of labeled data, which means we know the sample classes in advance while constructing the training data set. There is, however, another option for selecting training data: pool-based active learning, which was first introduced by Lewis and Gale in 1994 [5]. The learner can...
Online Active Learning Methods for Fast Label-Efficient Spam Filtering
Active learning methods seek to reduce the number of labeled examples needed to train an effective classifier, and have natural appeal in spam filtering applications where trustworthy labels for messages may be costly to acquire. Past investigations of active learning in spam filtering have focused on the pool-based scenario, where there is assumed to be a large, unlabeled data set and the goal...
Active Learning Image Spam Hunter
Image spam is annoying email users around the world. Most previous work for image spam detection focuses on supervised learning approaches. However, it is costly to get enough trustworthy labels for learning, especially for an adversarial problem where spammers constantly modify patterns to evade the classifier. To address this issue, we employ the principle of active learning where the learner...
Relaxed Online SVMs in the TREC Spam Filtering Track
Relaxed Online Support Vector Machines (ROSVMs) have recently been proposed as an efficient methodology for attaining an approximate SVM solution for streaming data such as the online spam filtering task. Here, we apply ROSVMs in the TREC 2007 Spam filtering track and report results. In particular, we explore the effect of various sliding-window sizes, trading off computation cost against classi...
Adaptive Spam Filtering Using Only Naive Bayes Text Classifiers
In the past few years, machine learning and in particular simple Naive Bayes classifiers have proven their value in filtering spam emails. We hereby put Naive Bayes filters to the test, against potentially more elaborate spam filters that will participate in the CEAS 2008 challenge. For this purpose, we use the variants of Naive Bayes that have proven more effective in our earlier studies. Furt...